All Questions
Tagged with algorithms bigdata
10 questions
5 votes
2 answers
3k views
SGDClassifier fit and partial_fit functions
I wanted to know what the correct way is to train the SGDClassifier model on new data observations. Should I use the fit function or the ...
2 votes
1 answer
72 views
Are there deduplication algorithms that do not work on a metric space?
Recently I got interested in the process of data cleansing and specifically in record linkage. Thus far I have read about deterministic and probabilistic approaches to deduplicating data sets and to some ...
6 votes
0 answers
118 views
Fixed-radius range search in non-Euclidean space
I'm trying to find the indexing data structure most suitable for my metric space: a set of IP network-related data (IP addresses, ports, TCP flags, ...); the distance function is continuous, non-Euclidean ...
0 votes
2 answers
109 views
Data Science Companies [closed]
I'm interested in the data science market. I was expecting that there would be a lot of companies making algorithms and models for companies, as in Kaggle competitions. But I struggle to find any....
2 votes
3 answers
2k views
Finding outliers in multiple dimensions
I'm working on a dataset which isn't normally distributed. The dataset contains three dimensions like cost, discount and profit. I'm trying to find possible outliers in all these dimensions. I used ...
4 votes
3 answers
5k views
How to explain the decision tree algorithm in layman's terms?
I have a task at hand, where I have to explain the decision tree algorithm to a person who does not have much understanding of ...
2 votes
1 answer
266 views
Time Complexity notation in Big Data platforms
I am redesigning some of the classical algorithms for the Hadoop/MapReduce framework. I was wondering if there is any established approach for denoting Big-O-style expressions to measure time complexity? ...
2 votes
1 answer
2k views
Optimizing Weka for large data sets
First of all, I hope I'm in the right StackExchange here. If not, apologies! I'm currently working with huge amounts of feature-value vectors. There are millions of these vectors (up to 20 million ...
1 vote
4 answers
6k views
Small project ideas for Machine Learning [closed]
I need some serious help. I am supposed to implement a project (Non-Existing as of now) for my Machine Learning course. I have no basics in AI or Data mining or Machine learning. I have been searching ...
10 votes
2 answers
2k views
Scalable Outlier/Anomaly Detection
I am trying to set up a big data infrastructure using Hadoop, Hive, Elasticsearch (amongst others), and I would like to run some algorithms over certain datasets. I would like the algorithms ...